Estimation of multiple parameters is a common task in signal processing. The Cramér-Rao bound (CRB) sets a statistical lower limit on the resulting errors when estimating parameters from a set of random observations. It can be understood as a fundamental measure of parameter uncertainty [1], [2]. As a general example, suppose $\theta$ denotes the vector of sought parameters and that the random observation model can be written as

$$ y = x_\theta + w, \tag{1} $$

where $x_\theta$ is a function or signal parameterized by $\theta$ and $w$ is a zero-mean Gaussian noise vector. Then the CRB for $\theta$ has the following notable properties:

i) For a fixed $\theta$, the CRB for $\theta$ decreases as the dimension of $y$ increases.

ii) For a fixed $y$, if additional parameters $\tilde\theta$ are estimated then the CRB for $\theta$ increases as the dimension of $\tilde\theta$ increases.

iii) If adding a set of observations $\tilde y$ requires estimating additional parameters $\tilde\theta$, then the CRB for $\theta$ decreases as the dimension of $\tilde y$ increases, provided the dimension of $\tilde\theta$ does not exceed that of $\tilde y$. This property implies both i) and ii) above.

iv) Among all possible distributions of $w$ with a fixed covariance matrix, the CRB for $\theta$ attains its maximum when $w$ is Gaussian, i.e., the Gaussian scenario is the 'worst-case' for estimating $\theta$ [4]-[6].
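Properties i) and ii) can be checked numerically in a simple special case. The sketch below assumes a linear signal $x_\theta = H\theta$ with white Gaussian noise of known variance `v`, for which the Fisher information matrix is $H^\top H / v$. The matrix `H`, the dimensions, and the helper names are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
v = 0.5                                  # noise variance (assumed known here)
H = rng.standard_normal((20, 3))         # 20 observations, 3 parameters of interest

def crb_matrix(H, v, k):
    """Upper-left k x k block of the inverse FIM for y = H @ params + w."""
    J = H.T @ H / v
    return np.linalg.inv(J)[:k, :k]

def is_psd(M, tol=1e-10):
    """True if the symmetric part of M is positive semidefinite."""
    return np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol

base = crb_matrix(H, v, 3)

# Property i): 10 extra observations of the same 3 parameters.
H_more = np.vstack([H, rng.standard_normal((10, 3))])
more_obs = crb_matrix(H_more, v, 3)

# Property ii): same observations, 2 extra unknown parameters.
H_nuis = np.hstack([H, rng.standard_normal((20, 2))])
more_par = crb_matrix(H_nuis, v, 3)

print(is_psd(base - more_obs))   # bound does not grow with more data
print(is_psd(more_par - base))   # bound does not shrink with more unknowns
```

Both comparisons are made in the matrix (positive semidefinite) sense, which is stronger than comparing determinants.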

In this lecture note, we show a general property of the CRB that quantifies the interdependencies between the parameters in $\theta$. The presented result is valid for more general models than (1) and also generalizes an earlier scalar-parameter result to vector parameters. It will be illustrated via two examples.


Let $\ell(\theta) = \ln p(y; \theta)$ denote the log-likelihood function and let $\hat\theta$ be any unbiased estimator. Then the mean square error (MSE) matrix $P_{\hat\theta} = E[(\theta - \hat\theta)(\theta - \hat\theta)^\top]$ is bounded from below by the inverse of the Fisher information matrix $J_\theta = -E[\partial^2_\theta \ell(\theta)]$, where $\partial^2_\theta$ denotes the second-order differential or Laplacian operator with respect to $\theta$. That is, $P_{\hat\theta} \succeq J_\theta^{-1}$, assuming from hereon that $J_\theta$ is nonsingular. This is the Cramér-Rao inequality [2], [8], [9].

The determinant of the MSE matrix, $|P_{\hat\theta}|$, is a scalar measure of the error magnitude. For unbiased estimators, $|P_{\hat\theta}|$ equals the 'generalized variance' of errors. By defining $\mathrm{CRB}(\theta) \triangleq |J_\theta^{-1}|$, the generalized error variance is bounded by

$$ |P_{\hat\theta}| \geq \mathrm{CRB}(\theta). $$

In the following we are interested in subvectors or elements of $\theta$. Letting $\theta = [\alpha^\top \; \beta^\top]^\top$, we can write the Fisher information matrix in block form,

$$ J_\theta = \begin{bmatrix} J_\alpha & J_{\alpha\beta} \\ J_{\beta\alpha} & J_\beta \end{bmatrix}. \tag{2} $$

IV. Main result

Let $a$ and $b$ be two random vectors. Two useful rules in probability theory are the chain rule

$$ p(a, b) = p(a|b)\,p(b) \tag{3} $$

and Bayes rule

$$ p(a) = \frac{p(b)}{p(b|a)}\,p(a|b). \tag{4} $$
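As a quick sanity check of the two rules in the form used here, the following sketch evaluates (3) and (4) on a small discrete joint distribution. The probability table is a made-up example, and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical joint pmf p(a, b) on a 2 x 3 grid (rows: a, columns: b).
p_ab = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])

p_a = p_ab.sum(axis=1)               # marginal p(a)
p_b = p_ab.sum(axis=0)               # marginal p(b)
p_a_given_b = p_ab / p_b             # p(a|b): each column sums to one
p_b_given_a = p_ab / p_a[:, None]    # p(b|a): each row sums to one

# Chain rule (3): p(a, b) = p(a|b) p(b)
chain_ok = np.allclose(p_ab, p_a_given_b * p_b)

# Bayes rule (4): p(a) = p(b) / p(b|a) * p(a|b), for every pair (a, b)
bayes_rhs = p_b / p_b_given_a * p_a_given_b
bayes_ok = np.allclose(bayes_rhs, p_a[:, None])

print(chain_ok, bayes_ok)
```

Note that the right-hand side of (4) yields the same value $p(a)$ for every choice of $b$, which is what the broadcasted comparison verifies.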


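Now consider two parameter vectors $\alpha$ and $\beta$. When both are unknown, their joint Cramér-Rao bound is given by

$$ \mathrm{CRB}(\alpha, \beta) = \left| \begin{bmatrix} J_\alpha & J_{\alpha\beta} \\ J_{\beta\alpha} & J_\beta \end{bmatrix}^{-1} \right|. \tag{5} $$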

I. Relevance 

In probability theory, the chain rule and Bayes rule are 
useful tools to analyze the statistical interdependence between 
multiple random variables and to derive tractable expressions 
for their distributions. In this lecture note, we provide analogs of the chain rule and Bayes rule for the Cramér-Rao bound associated with multiple parameters. The results are particularly useful when estimating parameters of interest in the presence of nuisance parameters.

II. Prerequisites

The reader needs basic knowledge about linear algebra, 
elementary probability theory, and statistical signal processing. 

III. Preliminaries

We will consider a general scenario in which we observe an $n \times 1$ random vector $y$. Its probability density function (pdf) $p(y; \theta)$ is parameterized by a $k \times 1$ deterministic vector $\theta$. The goal is to estimate $\theta$, or subvectors of $\theta$, given $y$.

The bound for $\alpha$ with known $\beta$ is simply

$$ \mathrm{CRB}(\alpha|\beta) = |J_\alpha^{-1}| \tag{6} $$

and the bound for $\alpha$ with unknown $\beta$ is

$$ \mathrm{CRB}(\alpha) = \left| (J_\alpha - J_{\alpha\beta} J_\beta^{-1} J_{\beta\alpha})^{-1} \right|. \tag{7} $$

((7) follows by evaluating the inverse in (5) and extracting the upper-left block corresponding to $\alpha$.) Eqs. (6) and (7) are the respective CRB analogs of conditional and marginal distributions for random variables.

By applying the Schur determinant formula [8], [11],

$$ |J_\theta| = |J_\alpha|\,|J_\beta - J_{\beta\alpha} J_\alpha^{-1} J_{\alpha\beta}| = |J_\beta|\,|J_\alpha - J_{\alpha\beta} J_\beta^{-1} J_{\beta\alpha}|, $$

along with $|J^{-1}| = |J|^{-1}$, to (5), (6) and (7), we can now state the Cramér-Rao bound analogs of the chain rule (3),

$$ \mathrm{CRB}(\alpha, \beta) = \mathrm{CRB}(\alpha|\beta)\,\mathrm{CRB}(\beta) \tag{8} $$

and of Bayes rule (4),

$$ \mathrm{CRB}(\alpha) = \frac{\mathrm{CRB}(\beta)}{\mathrm{CRB}(\beta|\alpha)}\,\mathrm{CRB}(\alpha|\beta). \tag{9} $$


The results are of course symmetric, i.e., one can interchange $\alpha$ and $\beta$.

From (8) we see that the joint error bound for $\alpha$ and $\beta$ equals the error bound for $\alpha$, when $\beta$ is known, multiplied by the error bound for $\beta$. More interestingly, (9) tells us that the error bound for $\alpha$ is equal to the bound for $\alpha$ when $\beta$ is known, multiplied by a factor, viz. $\mathrm{CRB}(\beta)/\mathrm{CRB}(\beta|\alpha) \geq 1$, that quantifies the influence of $\beta$ on one's ability to estimate $\alpha$.
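The two rules are easy to verify numerically for any positive definite Fisher information matrix. The sketch below uses a randomly generated matrix as a stand-in for $J_\theta$ (an assumption made purely for illustration) and evaluates the determinant-based bounds (5)-(7) before checking (8) and (9).

```python
import numpy as np

rng = np.random.default_rng(1)
ka, kb = 2, 3                          # dimensions of alpha and beta (arbitrary)
M = rng.standard_normal((ka + kb, ka + kb))
J = M @ M.T + np.eye(ka + kb)          # generic positive definite stand-in for the FIM

Ja, Jab = J[:ka, :ka], J[:ka, ka:]
Jba, Jb = J[ka:, :ka], J[ka:, ka:]
det, inv = np.linalg.det, np.linalg.inv

crb_joint = 1.0 / det(J)                           # (5)
crb_a_given_b = 1.0 / det(Ja)                      # (6)
crb_b_given_a = 1.0 / det(Jb)                      # (6) with roles swapped
crb_a = 1.0 / det(Ja - Jab @ inv(Jb) @ Jba)        # (7)
crb_b = 1.0 / det(Jb - Jba @ inv(Ja) @ Jab)        # (7) with roles swapped

# Chain rule analog (8) and Bayes rule analog (9).
chain_ok = np.isclose(crb_joint, crb_a_given_b * crb_b, rtol=1e-9, atol=0.0)
bayes_ok = np.isclose(crb_a, crb_b / crb_b_given_a * crb_a_given_b,
                      rtol=1e-9, atol=0.0)
inflation = crb_b / crb_b_given_a                  # factor in (9), never below 1

print(chain_ok, bayes_ok, inflation)
```

The quantity `inflation` is the factor by which not knowing $\beta$ increases the bound for $\alpha$; it equals one only when the off-diagonal block $J_{\alpha\beta}$ vanishes.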

Remark 1. The rules can be applied to cases with any number of additional parameters, besides $\alpha$ and $\beta$. Consider for instance the case of $\alpha$, $\beta$ and $\gamma$, where $\gamma$ is an unknown nuisance parameter. Then applying the chain rule twice yields

$$ \begin{aligned} \mathrm{CRB}(\alpha, \beta, \gamma) &= \mathrm{CRB}(\gamma|\alpha, \beta)\,\mathrm{CRB}(\alpha|\beta)\,\mathrm{CRB}(\beta) \\ &= \mathrm{CRB}(\gamma|\alpha, \beta)\,\mathrm{CRB}(\beta|\alpha)\,\mathrm{CRB}(\alpha) \end{aligned} \tag{10} $$

where the factors without $\gamma$ signify that the nuisance parameter is unknown. Combining the two expressions in (10) yields the analog of Bayes rule (9) for any number of additional parameters.

The joint error bound for a set of parameters $\alpha_1, \alpha_2, \alpha_3, \ldots$ can be similarly decomposed by a recursive application of the chain rule in order to analyze their interdependency and its impact on estimation.
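The decomposition in (10) can be sketched with a small helper that forms the effective Fisher information of a subset of parameters, treating some of the others as known and the rest as unknown nuisance parameters (which are eliminated by a Schur complement). The helper names `effective_fim` and `crb`, the index sets, and the random matrix are illustrative assumptions.

```python
import numpy as np

def effective_fim(J, keep, known=()):
    """FIM for the parameters in `keep` when the indices in `known` are given
    and all remaining parameters are unknown (eliminated by Schur complement)."""
    keep = list(keep)
    rest = [i for i in range(J.shape[0]) if i not in keep and i not in known]
    S = J[np.ix_(keep, keep)]
    if rest:
        S = S - J[np.ix_(keep, rest)] @ np.linalg.solve(
            J[np.ix_(rest, rest)], J[np.ix_(rest, keep)])
    return S

def crb(J, keep, known=()):
    """Determinant-based bound |J_eff^{-1}| for the parameters in `keep`."""
    return 1.0 / np.linalg.det(effective_fim(J, keep, known))

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
J = M @ M.T + np.eye(5)            # generic positive definite stand-in for the FIM
a, b, g = [0, 1], [2], [3, 4]      # index sets for alpha, beta, gamma

joint = crb(J, a + b + g)
first = crb(J, g, known=a + b) * crb(J, a, known=b) * crb(J, b)
second = crb(J, g, known=a + b) * crb(J, b, known=a) * crb(J, a)

print(np.isclose(joint, first, rtol=1e-9, atol=0.0),
      np.isclose(joint, second, rtol=1e-9, atol=0.0))
```

In `crb(J, a, known=b)` the block for $\gamma$ is neither kept nor known, so it is treated as unknown, matching the convention stated after (10).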

Remark 2. The CRB analog of Bayes rule (9) generalizes an earlier result that concerns only scalar parameters $\alpha$ and $\beta$ amid a vector of nuisance parameters $\gamma$. Our proof of (9) is also more direct than the earlier one.

Remark 3. These results are also applicable to the posterior, or Bayesian, Cramér-Rao bound (PCRB), in which $\theta$ is modeled as a random variable with a prior distribution. The PCRB is valid for the entire class of estimators $\hat\theta$, whether biased or not. The posterior Cramér-Rao inequality is then $P_{\hat\theta} \succeq J_\theta^{-1}$, where $J_\theta = -E[\partial^2_\theta \ln p(y, \theta)]$ is the Bayesian Fisher information matrix, $p(y, \theta)$ is the joint pdf and the expectation is with respect to this pdf. Letting $\theta = [\alpha^\top \; \beta^\top]^\top$, the matrix can be partitioned correspondingly,

$$ J_\theta = \begin{bmatrix} J_\alpha & J_{\alpha\beta} \\ J_{\beta\alpha} & J_\beta \end{bmatrix}, $$

and thereby the results (8), (9) and (10) can be applied to the PCRB as well.
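To illustrate Remark 3, the sketch below assumes a linear Gaussian model $y = H\theta + w$ with a zero-mean Gaussian prior on $\theta$, for which the Bayesian Fisher information matrix is the data term $H^\top H / v$ plus the prior precision. The model, the prior variances, and the function name are assumptions chosen for illustration. The ratio $\mathrm{CRB}(\alpha)/\mathrm{CRB}(\alpha|\beta)$ from (9) is then evaluated for a vague and for a tight prior on $\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, ka, kb, v = 30, 2, 2, 1.0
H = rng.standard_normal((n, ka + kb))    # hypothetical regression matrix

def inflation(prior_var_beta):
    """CRB(alpha)/CRB(alpha|beta) under a Gaussian prior; unit prior variance on alpha."""
    prior_prec = np.diag([1.0] * ka + [1.0 / prior_var_beta] * kb)
    J = H.T @ H / v + prior_prec          # Bayesian FIM in the linear Gaussian case
    Ja, Jab, Jb = J[:ka, :ka], J[:ka, ka:], J[ka:, ka:]
    crb_a = 1.0 / np.linalg.det(Ja - Jab @ np.linalg.solve(Jb, Jab.T))
    crb_a_given_b = 1.0 / np.linalg.det(Ja)
    return crb_a / crb_a_given_b

vague = inflation(1e3)    # almost no prior knowledge about beta
tight = inflation(1e-3)   # beta nearly known a priori

print(vague, tight)
```

A tighter prior on $\beta$ increases $J_\beta$ and therefore shrinks the factor toward one, which is the Bayesian counterpart of obtaining $\beta$ through side information.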


V. Examples 

Next, we illustrate via two examples how a decomposition like (9) can be used for analysis. The examples show that, by
quantifying the impact of nuisance parameters, it is possible to 
study the trade-off between the gain of obtaining them through 
independent side information versus estimating them jointly 
with the parameters of interest. 


A. Linear mixed model 
Consider a linear model

$$ y = Ax + Bz + w \in \mathbb{R}^n, $$

where $w$ is Gaussian noise with covariance matrix $vI$, and $x \in \mathbb{R}^{k_x}$ and $z \in \mathbb{R}^{k_z}$ are unknown parameters. The matrices are known and $\mathrm{rank}([A \; B]) = k_x + k_z < n$, which implies that the parameters $x$ and $z$ are embedded into two distinct range spaces, $\mathcal{R}(A)$ and $\mathcal{R}(B)$, respectively. Here $\mathcal{R}(A)$ denotes the linear subspace spanned by the columns of $A$. Under these conditions, and with the noise variance $v$ also unknown, the joint Fisher information matrix of $(x, z, v)$ equals

$$ J = \begin{bmatrix} \frac{1}{v} A^\top A & \frac{1}{v} A^\top B & 0 \\ \frac{1}{v} B^\top A & \frac{1}{v} B^\top B & 0 \\ 0 & 0 & \frac{n}{2v^2} \end{bmatrix}. $$

From this expression, we see that the bound for $v$ is independent of that for $x$ and $z$. That is, $\mathrm{CRB}(x, z, v) = \mathrm{CRB}(x, z)\,\mathrm{CRB}(v)$. This is a CRB analog of the independence for random variables. Furthermore, we obtain

$$ \mathrm{CRB}(z|x) = |J_z^{-1}| = |v (B^\top B)^{-1}| = v^{k_z} |B^\top B|^{-1} $$

and

$$ \mathrm{CRB}(z) = |(J_z - J_{zx} J_x^{-1} J_{xz})^{-1}| = |v (B^\top B - B^\top A (A^\top A)^{-1} A^\top B)^{-1}| = v^{k_z} |B^\top \Pi_A^\perp B|^{-1}, $$

where $\Pi_A^\perp$ is the projector onto the orthogonal complement of $\mathcal{R}(A)$.

The increase in the error bound for $x$ due to the lack of information about $z$ can now be quantified using (9),

$$ \mathrm{CRB}(x) = \frac{|B^\top B|}{|B^\top \Pi_A^\perp B|}\,\mathrm{CRB}(x|z), \tag{11} $$

where the factor $|B^\top \Pi_A^\perp B|$ measures the alignment of $\mathcal{R}(A)$ and $\mathcal{R}(B)$. When the range spaces are orthogonal we have that $|B^\top \Pi_A^\perp B| = |B^\top B|$, and by (11) the bound for $x$ is unaffected by one's ignorance about $z$. In scenarios where it is possible to obtain $z$ through additional side-information or calibration, instead of estimation, the cost can be weighed against the reduction of the error bound for $x$ by the given factor $|B^\top \Pi_A^\perp B| / |B^\top B|$.
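Relation (11) can be checked numerically. The sketch below draws random matrices `A` and `B` (hypothetical values with arbitrary dimensions), builds the Fisher information of $(x, z)$ for white Gaussian noise, and compares the directly computed bound for $x$ with the right-hand side of (11). The block for $v$ is omitted since it is decoupled from $x$ and $z$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, kx, kz, v = 40, 3, 2, 0.7
A = rng.standard_normal((n, kx))
B = rng.standard_normal((n, kz))

# Joint FIM of (x, z) for Gaussian noise with covariance v * I.
C = np.hstack([A, B])
J = C.T @ C / v
Jx, Jxz, Jz = J[:kx, :kx], J[:kx, kx:], J[kx:, kx:]

crb_x_given_z = 1.0 / np.linalg.det(Jx)                               # z known
crb_x = 1.0 / np.linalg.det(Jx - Jxz @ np.linalg.solve(Jz, Jxz.T))    # z unknown

# Projector onto the orthogonal complement of the range of A.
P_perp = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
factor = np.linalg.det(B.T @ B) / np.linalg.det(B.T @ P_perp @ B)

print(np.isclose(crb_x, factor * crb_x_given_z, rtol=1e-9, atol=0.0))
print(factor)   # inflation of the bound for x caused by not knowing z
```

Replacing `B` by a matrix whose columns are orthogonal to those of `A` drives `factor` to one, in line with the remark on orthogonal range spaces.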

This example has illustrated the interdependencies between the unknown parameters $x$, $z$ and $v$. Next we consider an example where the unknown parameters become asymptotically independent as the number of samples $n$ grows large.


B. Sine-wave fitting 

Sine-wave fitting is a problem that arises in system testing, for example of waveform recorders, and the IEEE Standard 1057 formalizes procedures to do so ([12] and references therein).

Consider $n$ uniform samples of a sinusoid in noise

$$ y(k) = \alpha \sin(\omega k + \phi) + C + w(k), $$

where $w(k)$ is a Gaussian white noise process with variance $v$ and $k = 0, \ldots, n-1$. The amplitude $\alpha$ and phase $\phi$ of the sinusoidal signal, along with the offset $C$, are of interest. In certain cases the frequency $\omega$ of the test signal may be obtained separately from the estimation of $\alpha$, $\phi$ and $C$. For simplicity, we first consider an alternative parameterization of the sinusoid, namely $\alpha \sin(\omega k + \phi) = A \cos(\omega k) + B \sin(\omega k)$, where $A = \alpha \sin(\phi)$ and $B = \alpha \cos(\phi)$. The parameters are $\theta = [A \; B \; C \; \omega \; v]^\top$.

As shown in [12], the Fisher information matrix can be decomposed into $J_\theta = \bar J_\theta + \tilde J_\theta$, where $\bar J_\theta$ contains the dominant terms and $\tilde J_\theta$ contains the remainder, so that $J_\theta^{-1} \approx \bar J_\theta^{-1}$ for large $n$. Using this approximation we now analyze the bounds for $A$, $B$ and $C$ by application of (9).

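First, let $\theta' = [A \; B \; C \; v]^\top$ be the parameter vector without $\omega$. Then

$$ \mathrm{CRB}(\omega) = |J_\omega - J_{\omega\theta'} J_{\theta'}^{-1} J_{\theta'\omega}|^{-1} \approx \left( \frac{1}{2v} \left( \frac{(A^2 + B^2) n^3}{3} - \frac{(A^2 + B^2) n^3}{4} \right) \right)^{-1} = \frac{2v}{n^3} \frac{12}{(A^2 + B^2)}. $$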


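Second, let $\theta'' = [B \; C \; v]^\top$ be the parameter vector without $\omega$ and $A$. Then

$$ \mathrm{CRB}(\omega|A) = |J_\omega - J_{\omega\theta''} J_{\theta''}^{-1} J_{\theta''\omega}|^{-1} \approx \left( \frac{1}{2v} \left( \frac{(A^2 + B^2) n^3}{3} - \frac{A^2 n^3}{4} \right) \right)^{-1} = \frac{2v}{n^3} \frac{12}{(A^2 + B^2) + 3B^2}. $$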


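Thus $\mathrm{CRB}(\omega)/\mathrm{CRB}(\omega|A) \approx 1 + 3B^2/(A^2 + B^2) \in [1, 4]$. Note that $\bar J_{\theta'}$ and $\bar J_{\theta''}$ are diagonal, making their inverses particularly easy to compute. Applying (9) we obtain

$$ \begin{aligned} \mathrm{CRB}(A) &\approx \left( 1 + \frac{3B^2}{A^2 + B^2} \right) \mathrm{CRB}(A|\omega) \\ \mathrm{CRB}(B) &\approx \left( 1 + \frac{3A^2}{A^2 + B^2} \right) \mathrm{CRB}(B|\omega) \\ \mathrm{CRB}(C) &\approx \mathrm{CRB}(C|\omega), \end{aligned} $$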


where the bounds for $B$ and $C$ are derived in a similar manner as for $A$. This shows that the bound for the offset $C$ becomes independent of the knowledge of the frequency $\omega$ as $n$ increases, while the bounds for $A$ and $B$ are inflated by factors ranging between 1 and 4 due to one's ignorance about $\omega$.
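These large-sample factors can be compared against the exact Fisher information, computed directly from the gradient of the signal model rather than from the dominant-term approximation. In the sketch below the numerical values of `A`, `B`, `omega`, `v` and `n` are arbitrary assumptions, and the variance $v$ is left out of the parameter vector because, for Gaussian noise, it is decoupled from the parameters of the mean.

```python
import numpy as np

def sine_fim(A, B, omega, v, n):
    """Exact FIM for [A, B, C, omega] in y(k) = A cos(omega k) + B sin(omega k) + C + w(k)."""
    k = np.arange(n)
    c, s = np.cos(omega * k), np.sin(omega * k)
    # Columns: derivatives of the mean with respect to A, B, C and omega.
    G = np.column_stack([c, s, np.ones(n), k * (-A * s + B * c)])
    return G.T @ G / v

def inflation(J, i, j=3):
    """CRB(theta_i) / CRB(theta_i | theta_j), with all other parameters unknown."""
    crb_i = np.linalg.inv(J)[i, i]
    keep = [m for m in range(J.shape[0]) if m != j]
    pos = keep.index(i)
    crb_i_given_j = np.linalg.inv(J[np.ix_(keep, keep)])[pos, pos]
    return crb_i / crb_i_given_j

A, B, omega, v, n = 1.0, 2.0, 0.9, 0.1, 5000
J = sine_fim(A, B, omega, v, n)
r2 = A**2 + B**2

print(inflation(J, 0), 1 + 3 * B**2 / r2)   # exact vs. large-n factor for A
print(inflation(J, 1), 1 + 3 * A**2 / r2)   # exact vs. large-n factor for B
print(inflation(J, 2), 1.0)                 # exact vs. large-n factor for C
```

For large `n` each exact ratio should be close to the corresponding asymptotic factor; the agreement degrades for short records or for frequencies near 0 or $\pi$, where the neglected terms are no longer small.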

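When considering the original parameterization $\vartheta = [\alpha \; \phi \; C \; \omega \; v]^\top$ there exists an invertible relation, $\vartheta = g(\theta) = [\sqrt{A^2 + B^2} \;\; \arctan(A/B) \;\; C \;\; \omega \;\; v]^\top$. Therefore we have that $J_\vartheta^{-1} = \partial_\theta g(\theta)\, J_\theta^{-1}\, \partial_\theta g(\theta)^\top$ [2], where $\partial_\theta$ denotes the first-order differential or gradient with respect to $\theta$ and

$$ \partial_\theta g(\theta) = \begin{bmatrix} \sin\phi & \cos\phi & 0 \\ \dfrac{\cos\phi}{\alpha} & -\dfrac{\sin\phi}{\alpha} & 0 \\ 0 & 0 & I_3 \end{bmatrix}. $$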
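Exploiting the approximation $J_\theta^{-1} \approx \bar J_\theta^{-1}$ once again, one obtains [12]

$$ \begin{aligned} \mathrm{CRB}(\alpha) &\approx \mathrm{CRB}(\alpha|\omega) \\ \mathrm{CRB}(\phi) &\approx 4\,\mathrm{CRB}(\phi|\omega). \end{aligned} $$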


This shows that in large samples the error bound for the amplitude $\alpha$ also becomes independent of knowledge about the frequency $\omega$, whereas not knowing $\omega$ inflates the bound for the phase $\phi$ by a factor of 4.

For large data records, the cost of pre-calibrating the frequency can be weighed against a reduction of the error bound for the phase, while the error bounds for the amplitude and offset will not be improved.
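The amplitude and phase results can be reproduced numerically by transforming the exact inverse Fisher information with the Jacobian of the reparameterization. The sketch below reuses an exact FIM computed from the model gradient; the numerical values are arbitrary assumptions, `arctan2` is used in place of $\arctan(A/B)$ to handle all quadrants, and the decoupled variance $v$ is again omitted.

```python
import numpy as np

def sine_fim(A, B, omega, v, n):
    """Exact FIM for [A, B, C, omega] in y(k) = A cos(omega k) + B sin(omega k) + C + w(k)."""
    k = np.arange(n)
    c, s = np.cos(omega * k), np.sin(omega * k)
    G = np.column_stack([c, s, np.ones(n), k * (-A * s + B * c)])
    return G.T @ G / v

A, B, omega, v, n = 1.0, 2.0, 0.9, 0.1, 5000
alpha, phi = np.hypot(A, B), np.arctan2(A, B)   # A = alpha sin(phi), B = alpha cos(phi)

# Jacobian of [alpha, phi, C, omega] with respect to [A, B, C, omega].
D = np.eye(4)
D[0, :2] = [np.sin(phi), np.cos(phi)]
D[1, :2] = [np.cos(phi) / alpha, -np.sin(phi) / alpha]

J = sine_fim(A, B, omega, v, n)
cov_unknown = D @ np.linalg.inv(J) @ D.T                          # omega unknown
cov_known = D[:3, :3] @ np.linalg.inv(J[:3, :3]) @ D[:3, :3].T    # omega known

ratio_alpha = cov_unknown[0, 0] / cov_known[0, 0]   # expected near 1 for large n
ratio_phi = cov_unknown[1, 1] / cov_known[1, 1]     # expected near 4 for large n

print(ratio_alpha, ratio_phi)
```

The diagonal entries are scalar bounds, so their ratios correspond to $\mathrm{CRB}(\alpha)/\mathrm{CRB}(\alpha|\omega)$ and $\mathrm{CRB}(\phi)/\mathrm{CRB}(\phi|\omega)$.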


VI. What we have learned 

An analog of Bayes rule for the Cramér-Rao bound has been derived. This analogous rule enables a formalized decomposition and quantification of the mutual dependencies between multiple unknown parameters. The use of the rule was illustrated in two estimation problems.